Experimental Estimation of Number of Clusters Based on Cluster Quality

Authors

  • G. Hannah Grace
  • Kalyani Desikan
Abstract

Text clustering is a text mining technique that divides a given set of text documents into meaningful clusters. It is used to organize large collections of text documents. Most clustering algorithms require the number of clusters to be specified a priori, which is a drawback. The aim of this paper is to show experimentally how to determine the number of clusters based on cluster quality. Since partitional clustering algorithms are well suited to clustering large document datasets, we have confined our analysis to a partitional clustering algorithm.
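The idea in the abstract can be illustrated with a minimal sketch: run a partitional algorithm for several candidate values of k and keep the value that gives the best cluster quality. The abstract does not name the algorithm or the quality measure, so everything below is an assumption made for illustration: plain k-means with a deterministic farthest-first initialization, the mean silhouette coefficient as the quality score, and a toy set of 2-D points standing in for document vectors. The function names (kmeans, silhouette, estimate_k) are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first initialization (an assumption,
    chosen only to keep the sketch deterministic). Returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def estimate_k(points, k_values):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
best_k, scores = estimate_k(points, range(2, 6))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On this toy data the score peaks at three clusters, matching the three groups in the input. For real text collections the points would be document vectors (for example TF-IDF weights), and a different quality measure could be substituted without changing the selection loop.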

Similar resources

Target Detection Improvements in Hyperspectral Images by Adjusting Band Weights and Identifying end-members in Feature Space Clusters

Spectral target detection can be regarded as one of the strategic applications of hyperspectral data analysis. The presence of targets in an area smaller than a pixel’s ground coverage has led to the development of spectral unmixing methods to detect these types of targets. Usually, spectral unmixing algorithms assume similar weights for the spectral bands. Howe...

Application of a Self-Organizing Map for Clustering the Groundwater Quality in Kerman Province and Assessment its Suitability for Drinking and Irrigation Purposes

Evaluation of groundwater hydrochemical characteristics is necessary for planning and water resources management in terms of quality. In the present study, a self-organizing map (SOM) clustering technique was used to recognize the homogeneous clusters of hydrochemical parameters in water resources (including wells, springs and qanats) of Kerman province; then, the quality classification of groun...

Estimation of genetic diversity in rice (Oryza sativa L.) genotypes using SSR markers under salinity stress. Fatemeh Gholizadeh and Saeed Navabpour

To study the genetic diversity in rice (Oryza sativa L.), 29 genotypes consisting of landraces and pure and improved lines were evaluated using simple sequence repeat (SSR) markers. A total of 30 SSR primers were used to amplify part of the rice genome in the germplasm. The PIC values ranged from 0.07 (RM 340) to 0.71 (RM 7426) with an average of 0.45. The results showed a total number of 106...

Simulation of Fabrication toward High Quality Thin Films for Robotic Applications by Ionized Cluster Beam Deposition

The most commonly used method for producing thin films is based on the deposition of atoms or molecules onto a solid surface. One suitable method for producing high-quality metallic, semiconductor and organic thin films is ionized cluster beam deposition (ICBD); such films are used in electronic, robotic, optical and optoelectronic devices. Many important factors such as cluster size, cluster...

KD-Tree Based Clustering for Gene Expression Data

K-means is one of the most widely researched clustering algorithms, but it is sensitive to the selection of initial cluster centers and to the estimation of the number of clusters. In this chapter, we propose a novel approach to find efficient initial cluster centers using a kd-tree and to compute the number of clusters using a joint distance function. We have carried out extensive experiments on various synt...

Journal title:
  • CoRR

Volume abs/1503.03168  Issue 

Pages  -

Publication date 2014